Abstract 

We argue from the point of view of statistical inference that the quantum 
relative entropy is a good measure for distinguishing between two quantum 
states (or two classes of quantum states) described by density matrices. We 
extend this notion to describe the amount of entanglement between two quan- 
tum systems from a statistical point of view. Our measure is independent of 
the number of entangled systems and their dimensionality. 

Recent work has taught us that Bell's inequalities are not always a good criterion for 
distinguishing entangled states (i.e. those possessing a degree of quantum correlations) from 
disentangled states [1]. This discovery has initiated much work in quantum information 
theory (e.g. [?]), particularly concerning the search for a measure of the amount of entanglement 
contained within a given quantum state [?]. In a recent letter [6], we presented 
conditions that any measure of entanglement has to satisfy. This was motivated by the 
fact that local actions, combined only with classical communications, should not be able to 
increase the amount of entanglement. In [6] we defined our measure as the minimal 
distance of an entangled state to the set of disentangled states. This distance function (not 
necessarily a metric) could, for example, be given by the quantum relative entropy (to 
be defined later) or by the Bures metric (for a definition see e.g. [7]). Our measure of 
entanglement was derived from the abstract idea of closest approximation rather than from 
intuitive physical grounds. In this paper we start from an entirely different point of view 
and derive a measure of entanglement from the idea of distinguishing two quantum states 
starting from classical information theory [8]. We find that these new insights lead to the 
same measure of entanglement as in [6] (but now the quantum relative entropy is picked out 
from among the possible measures of "distance"). This corroborates the results of [6] and 
puts them on a firm statistical basis allowing experimental tests to determine the amount 
of entanglement. 

In order to understand our argument in the quantum case we must first describe its 
classical counterpart. Suppose that we are asked to distinguish between two probability 
distributions, taken for simplicity to be discrete. Say that we either have a fair coin with a 
"fifty-fifty" head-tail probability distribution, or an unfair coin with "seventy-thirty" head- 
tail probability distribution. We are allowed to toss a single coin N times and we want to 
know which one it is. To be more general, let us say that we have a dichotomic variable 
with the following distribution of probabilities: p(1) = p and p(0) = 1 - p. The probability 
that from N experiments (trials) we obtain n 1's and (N - n) 0's is given by the binomial 
distribution: 

$$P_N(n) = \binom{N}{n}\, p^n (1-p)^{N-n} . \tag{1}$$



This can be written as 

$$P_N(n) = \exp\{\ln P_N(n)\} = \exp\left\{\ln\left[\binom{N}{n}\, p^n (1-p)^{N-n}\right]\right\} . \tag{2}$$

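However, using Stirling's approximation for large numbers, the exponent can be 
considerably simplified: 

$$\ln\left[\binom{N}{n}\, p^n (1-p)^{N-n}\right] \approx -N\left\{ \frac{n}{N}\ln\frac{n}{N} + \left(1-\frac{n}{N}\right)\ln\left(1-\frac{n}{N}\right) - \frac{n}{N}\ln p - \left(1-\frac{n}{N}\right)\ln(1-p) \right\} . \tag{3}$$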

Now the quantity n/N is our measured frequency of 1's and likewise 1 - n/N is the 
measured frequency of 0's in N trials. The probabilities which we infer from this distribution 
are given by the maximum likelihood estimate [8]: p_inf(1) = n/N and p_inf(0) = 1 - n/N. 
These are, in general, different from p and 1 - p. The crucial question we wish to ask, therefore, 
is: what is the probability that after N trials our inferred probabilities are q and 1 - q, if the 
experiment was done using a system having "true" probabilities p and 1 - p? In the light 
of the coin example we ask what is the probability of wrongly inferring that we have a fair 
coin when, in fact, the "seventy-thirty" unfair one was used in the experiments? Clearly 
the answer is given by replacing n/N by q in eq. (3). The result in the large N limit is 

$$P_N(p \to q) = e^{-N S(q\|p)} , \tag{4}$$

where 

$$S(q\|p) := q\ln q + (1-q)\ln(1-q) - q\ln p - (1-q)\ln(1-p) \tag{5}$$

is the so-called relative entropy, or the Kullback-Leibler distance [?], between the binary 
distributions p and q. In general it is easy to see that the probability to confuse a distribution 
{p_i} with {q_i} in N measurements is given by 

$$P_N(p \to q) = e^{-N \sum_i (q_i \ln q_i - q_i \ln p_i)} . \tag{6}$$
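
The large-N claim behind eqs. (3)-(5) can be checked numerically. The following is a minimal sketch, not part of the original derivation: it compares the exact binomial probability of eq. (1) with the exponent N S(q||p) for the fair versus "seventy-thirty" coin. The function names and the chosen values of N are illustrative assumptions.

```python
import math

def relative_entropy(q, p):
    """Classical relative entropy S(q||p) = sum_i q_i (ln q_i - ln p_i), in nats."""
    total = 0.0
    for qi, pi in zip(q, p):
        if qi == 0.0:
            continue              # 0 ln 0 is taken to be 0
        if pi == 0.0:
            return math.inf       # q puts weight where p has none
        total += qi * (math.log(qi) - math.log(pi))
    return total

def exact_log_binomial(N, n, p):
    """ln of the binomial probability of eq. (1), using log-gamma to avoid overflow."""
    log_choose = math.lgamma(N + 1) - math.lgamma(n + 1) - math.lgamma(N - n + 1)
    return log_choose + n * math.log(p) + (N - n) * math.log(1 - p)

p_true = 0.7       # the unfair coin is the one actually tossed
q_inferred = 0.5   # observed frequencies that would suggest a fair coin
S = relative_entropy([q_inferred, 1 - q_inferred], [p_true, 1 - p_true])

# Decay rate -ln(P_N)/N of the exact probability, to be compared with S.
rates = {}
for N in (10, 100, 1000, 10000):
    n = round(q_inferred * N)
    rates[N] = -exact_log_binomial(N, n, p_true) / N
    print(N, rates[N], S)
```

The exact decay rate approaches S(q||p) as N grows. The remaining gap comes from the sub-exponential prefactor that Stirling's approximation drops in eq. (3).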



As the relative entropy is an asymmetric quantity, a natural question to ask is: why is the 
probability of confusing p with q different from the probability of confusing q with p? The 
following simple "coin" example will explain this. Suppose we have a fair coin and a completely 
unfair coin (two-heads for example). Suppose we have to decide which one it is, but we are 
allowed to do N experiments on only one, of course unknown-to-us, coin. So, say we are 
tossing the unfair coin. Then as heads is the only possible outcome, we will never confuse the 
unfair coin with the fair one, as after each trial the inferred probabilities will be p(head) = 1 
and p(tail) = 0. This is in fact corroborated by our formula in eq. (4) as e^{-∞} = 0. On the 
other hand, suppose we are tossing the fair coin: then after the first outcome, which could 
equally be heads or tails, we have a probability of 1/2 of confusing the coins (i.e. if heads 
shows up we will make the wrong inference, whereas if tails shows up it will be the right 
inference). This also follows from eq. (4) as e^{-ln 2} = 1/2 (note that here the formula is 
correct even for small N!). 
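
The asymmetry in this coin example can be reproduced directly from the definition of the relative entropy. This is a small illustrative sketch; the helper names are assumptions introduced here, and the convention 0 ln 0 = 0 is used.

```python
import math

def relative_entropy(q, p):
    """Classical relative entropy S(q||p) = sum_i q_i (ln q_i - ln p_i), in nats."""
    total = 0.0
    for qi, pi in zip(q, p):
        if qi == 0.0:
            continue              # 0 ln 0 is taken to be 0
        if pi == 0.0:
            return math.inf       # inferred outcome is impossible under p
        total += qi * (math.log(qi) - math.log(pi))
    return total

def confusion_probability(S, N):
    """Large-N estimate e^{-N S} of the probability of the wrong inference."""
    return math.exp(-N * S)

fair = [0.5, 0.5]          # (heads, tails)
two_headed = [1.0, 0.0]

# Tossing the two-headed coin (true p) and inferring the fair one (q).
s_toss_two_headed = relative_entropy(fair, two_headed)
# Tossing the fair coin (true p) and inferring the two-headed one (q).
s_toss_fair = relative_entropy(two_headed, fair)

print(s_toss_two_headed, confusion_probability(s_toss_two_headed, 1))
print(s_toss_fair, confusion_probability(s_toss_fair, 1))
```

The first direction gives an infinite relative entropy and hence zero probability of confusion. The second gives ln 2 and hence probability 1/2 after a single toss, as in the text.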

The central aim for us in this paper is to generalize this idea to distinguish (or, 
equivalently, confuse) two quantum states which are completely described by their density matrices. 
To that end, suppose we have two states σ and ρ. How can we distinguish them? We can 
choose a Positive Operator Valued Measure (POVM) $\sum_i A_i = 1$ which generates two distributions 
via 

$$p_i = \mathrm{tr}\, A_i \sigma \tag{7}$$

$$q_i = \mathrm{tr}\, A_i \rho , \tag{8}$$

and use classical reasoning to distinguish these two distributions. However, the choice of 
POVM's is not unique. It is therefore best to choose that POVM which distinguishes the 
distributions most, i.e. for which the relative entropy is largest. Thus we arrive at the 
following quantity 

$$S_1(\sigma\|\rho) := \sup_{A's} \sum_i \left\{ \mathrm{tr} A_i \sigma \, \ln \mathrm{tr} A_i \sigma - \mathrm{tr} A_i \sigma \, \ln \mathrm{tr} A_i \rho \right\} ,$$

where the supremum is taken over all POVM's. The above is not the most general 
measurement that we can make, however. In general we have N copies of σ and ρ in the state 

$$\sigma^N = \underbrace{\sigma \otimes \sigma \cdots \otimes \sigma}_{\text{total of } N \text{ terms}} \tag{9}$$

$$\rho^N = \underbrace{\rho \otimes \rho \cdots \otimes \rho}_{\text{total of } N \text{ terms}} . \tag{10}$$

We may now apply a POVM $\sum_i A_i = 1$ acting on σ^N and ρ^N. Consequently, we define a 
new type of relative entropy 
new type of relative entropy 

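$$S_N(\sigma\|\rho) := \sup_{A's} \left\{ \frac{1}{N} \sum_i \left( \mathrm{tr} A_i \sigma^N \, \ln \mathrm{tr} A_i \sigma^N - \mathrm{tr} A_i \sigma^N \, \ln \mathrm{tr} A_i \rho^N \right) \right\} . \tag{11}$$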



Now it can be shown that [10] 

$$S(\sigma\|\rho) \ge S_N \tag{12}$$

where 

$$S(\sigma\|\rho) := \mathrm{tr}(\sigma \ln \sigma - \sigma \ln \rho) \tag{13}$$

is the quantum relative entropy [?] (for a summary of the properties of the quantum 
relative entropy see [13]). Equality is achieved in eq. (12) iff σ and ρ commute [?]. However, 
for any σ and ρ it is true that [15] 

$$S(\sigma\|\rho) = \lim_{N \to \infty} S_N .$$

In fact, this limit can be achieved by projective measurements which are independent of σ 
[16]. From these considerations it would naturally follow that the probability of confusing 
two quantum states σ and ρ (after performing N measurements on ρ) is (for large N): 

$$P_N(\rho \to \sigma) = e^{-N S(\sigma\|\rho)} . \tag{14}$$
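
To make eqs. (7), (8), (12) and (13) concrete, here is a minimal single-qubit sketch. It assumes the Bloch-vector parametrization σ = (1 + s·τ)/2 and ρ = (1 + r·τ)/2 with |r| < 1, and it only scans single-copy (N = 1) projective measurements in one plane, so it illustrates the inequality rather than the full supremum. All function names and numerical values are illustrative assumptions.

```python
import math

def classical_relative_entropy(p, q):
    """sum_i p_i (ln p_i - ln q_i), in nats, with 0 ln 0 taken as 0."""
    total = 0.0
    for pi, qi in zip(p, q):
        if pi == 0.0:
            continue
        if qi == 0.0:
            return math.inf
        total += pi * (math.log(pi) - math.log(qi))
    return total

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def qubit_relative_entropy(s, r):
    """S(sigma||rho) = tr(sigma ln sigma - sigma ln rho) for Bloch vectors s, r (|r| < 1)."""
    s_len = math.sqrt(dot(s, s))
    r_len = math.sqrt(dot(r, r))
    # tr(sigma ln sigma): sigma has eigenvalues (1 +/- |s|)/2.
    tr_sigma_ln_sigma = sum(x * math.log(x)
                            for x in ((1 + s_len) / 2, (1 - s_len) / 2) if x > 0)
    # tr(sigma ln rho): rho has eigenvalues (1 +/- |r|)/2 along +/- r/|r|,
    # and sigma has weight (1 +/- s.r/|r|)/2 on those two eigenvectors.
    c = dot(s, r) / r_len if r_len > 0 else 0.0
    tr_sigma_ln_rho = (((1 + c) / 2) * math.log((1 + r_len) / 2)
                       + ((1 - c) / 2) * math.log((1 - r_len) / 2))
    return tr_sigma_ln_sigma - tr_sigma_ln_rho

def measured_relative_entropy(s, r, m):
    """Relative entropy of p_i = tr(A_i sigma) and q_i = tr(A_i rho) for the
    projective measurement A_+/- = (1 +/- m.tau)/2 along the unit vector m."""
    p = [(1 + dot(s, m)) / 2, (1 - dot(s, m)) / 2]
    q = [(1 + dot(r, m)) / 2, (1 - dot(r, m)) / 2]
    return classical_relative_entropy(p, q)

sigma = (0.0, 0.0, 0.8)   # Bloch vector of sigma
rho = (0.6, 0.0, 0.0)     # Bloch vector of rho (does not commute with sigma)
quantum = qubit_relative_entropy(sigma, rho)
angles = [2 * math.pi * k / 360 for k in range(360)]
best_measured = max(measured_relative_entropy(sigma, rho, (math.sin(t), 0.0, math.cos(t)))
                    for t in angles)
print(best_measured, quantum)      # measured value stays below the quantum one

rho_commuting = (0.0, 0.0, 0.6)    # same axis as sigma, so the states commute
quantum_c = qubit_relative_entropy(sigma, rho_commuting)
measured_c = measured_relative_entropy(sigma, rho_commuting, (0.0, 0.0, 1.0))
print(measured_c, quantum_c)       # equal: the common eigenbasis is optimal
```

For the non-commuting pair no single-copy measurement in the scan reaches S(σ||ρ), in line with eq. (12). For the commuting pair, measuring in the common eigenbasis already attains equality.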

We would like to stress here that classical statistical reasoning applied to distinguishing 
quantum states leads to the above formula. There are, however, other approaches. Some 
take eq. (13) for their starting point and then derive the rest of the formalism thenceforth 
[?]. Others, on the other hand, assume a set of axioms that are necessary to be satisfied 
by the quantum analogue of the relative entropy (e.g. it should reduce to the classical 
relative entropy if the density operators commute, i.e. if they are "classical") and then 
derive eq. (13) as a consequence [?]. In any case, as we have argued here, there is a strong 
reason to believe that the quantum relative entropy S(σ||ρ) plays the same role in quantum 
statistics as the classical relative entropy plays in classical statistics. A simple example 
with a "quantum coin" will clarify this point further [17]. Let us suppose that we have to 
distinguish between a pure, maximally entangled Bell state |φ+⟩ = (|00⟩ + |11⟩)/√2 and 
a mixture ρ = (|00⟩⟨00| + |11⟩⟨11|)/2. Again, we have to decide which state we have by 
performing N experiments of our choice on it. In this case we choose to perform projections 
onto the state |φ+⟩ = (|00⟩ + |11⟩)/√2. Then if the state ρ is in our possession, we will be 
successful only 50 percent of the time (the other 50 percent of the time we will obtain the 
orthogonal Bell state |φ-⟩ = (|00⟩ - |11⟩)/√2). So, if we perform a single experiment we 
have a 1/2 chance of making the wrong inference. If, on the other hand, we have |φ+⟩ we 
will never confuse it with ρ since we are projecting onto the state itself, which always gives a 
positive result. This is in direct analogy with the classical coin example and is, in addition, 
confirmed by eq. (13). In general, however, the states that we have to distinguish will not 
be as simple as those above. Then we would have to find the optimal measurement to 
distinguish between given states in order to reproduce eq. (13) from eq. (11). 
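
The numbers in this "quantum coin" example can be verified with a few lines of arithmetic. The sketch below assumes the basis ordering |00⟩, |01⟩, |10⟩, |11⟩ and exploits the fact that the mixture is diagonal in that basis. For a pure first argument, tr(σ ln σ) = 0, so the relative entropy reduces to minus the expectation of ln ρ. The helper names are illustrative assumptions.

```python
import math

# Real amplitudes in the assumed basis order |00>, |01>, |10>, |11>.
phi_plus = [1 / math.sqrt(2), 0.0, 0.0, 1 / math.sqrt(2)]
phi_minus = [1 / math.sqrt(2), 0.0, 0.0, -1 / math.sqrt(2)]
rho_diag = [0.5, 0.0, 0.0, 0.5]   # diagonal of (|00><00| + |11><11|)/2

def probability_on_mixture(state):
    """<state| rho |state> for the diagonal mixture above."""
    return sum(w * a * a for w, a in zip(rho_diag, state))

def probability_on_bell(state):
    """|<state|phi+>|^2, the outcome probability when phi+ is measured."""
    return sum(a * b for a, b in zip(state, phi_plus)) ** 2

def relative_entropy_pure_vs_diagonal(state, diag):
    """S(sigma||rho) for pure sigma = |state><state| and diagonal rho.
    Since tr(sigma ln sigma) = 0, this is -<state| ln rho |state>."""
    total = 0.0
    for a, w in zip(state, diag):
        weight = a * a
        if weight == 0.0:
            continue
        if w == 0.0:
            return math.inf       # state has weight outside the support of rho
        total -= weight * math.log(w)
    return total

p_success_on_mixture = probability_on_mixture(phi_plus)   # projecting rho onto phi+
p_minus_on_bell = probability_on_bell(phi_minus)          # phi+ never yields phi-
S = relative_entropy_pure_vs_diagonal(phi_plus, rho_diag)

print(p_success_on_mixture, p_minus_on_bell, S, math.exp(-S))
```

The relative entropy comes out as ln 2, so e^{-S} = 1/2 reproduces the 50 percent chance of a wrong inference after one measurement on ρ.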

Now we wish to use the above reasoning to quantify entanglement. Entanglement may be 
understood as the distinguishability of a given state from all entirely disentangled 
ones. The question is then, in the spirit of the above discussion, as follows: what is the 
probability that we confuse a given state with a disentangled one after performing a total 
of N measurements? The less the state is entangled the easier it is to confuse it with a 
disentangled one and vice versa. Thus, the probability to confuse σ with a disentangled 
state, having performed N experiments on ρ ∈ D, is of the form 

$$e^{-N E(\sigma)} , \tag{15}$$

where E(σ) is the entanglement (obviously if E = 0 then the state is indistinguishable from 
a disentangled one since it is disentangled itself!). In comparison with eq. (14) we define 
E(σ) to be 

$$E(\sigma) := \min_{\rho \in \mathcal{D}} S(\sigma\|\rho) \tag{16}$$



where D is the set of all disentangled states. So for the entanglement of σ we use the 
quantum relative entropy with that disentangled ρ which is the most indistinguishable 
from σ. Obviously, the greater the entanglement of a state, the smaller is the chance of 
confusing it with a disentangled state in N measurements. Note that eq. (16) is the same 
measure as that suggested in our previous letter [6]. There we showed that the Bures metric, 
when used instead of S(σ||ρ), would also be a good measure of entanglement. However, the 
Bures distance is a symmetric quantity and arises from different statistical considerations 
from those used above (see [7] for an overview). Thus, depending on the way we decide to 
make our measurements, we obtain different ways of comparing the results (i.e. different 
"distances" between probability distributions) which, in turn, determine our entanglement 
measure (more correctly, the quantity that is to replace S(σ||ρ) in eq. (16)). The convention 
that we use here assumes performing measurements on ρ. We could, of course, envisage 
making measurements on σ, in which case our measure of entanglement would be E(σ) := 
min_{ρ∈D} S(ρ||σ). However, for σ being, for example, a maximally entangled Bell state this 
quantity would be infinite. This agrees with our statistical interpretation that a Bell state, 
when measurements are performed on it, could never be confused with a disentangled state 
and eq. (14) gives probability zero of confusion. But, in order to avoid dealing with a 
physically undesired infinite amount of entanglement we keep to the convention given in eq. 
(16). 
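
As a rough illustration of the minimization in eq. (16), the sketch below restricts ρ to a one-parameter family of disentangled states, ρ_λ = λ|00⟩⟨00| + (1 - λ)|11⟩⟨11|, and takes σ to be the Bell state |φ+⟩. Both the family and the grid search are assumptions made for this example. Since the search does not cover all of D, the result is only an upper bound on E(σ).

```python
import math

def bell_relative_entropy(lam):
    """S(sigma||rho_lam) for sigma = |phi+><phi+| and
    rho_lam = lam |00><00| + (1 - lam) |11><11|, with 0 < lam < 1.
    sigma is pure, so S = -<phi+| ln rho_lam |phi+>."""
    return -(0.5 * math.log(lam) + 0.5 * math.log(1.0 - lam))

# Grid search over the restricted family (an illustrative choice, not all of D).
grid = [k / 1000 for k in range(1, 1000)]
best_lam = min(grid, key=bell_relative_entropy)
upper_bound = bell_relative_entropy(best_lam)

print(best_lam, upper_bound, math.log(2))
```

Within this family the minimum sits at λ = 1/2, which is the mixture from the "quantum coin" example, and the value there is ln 2.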

We see that the above treatment does not refer to the number (or indeed dimensionality) 
of the entangled systems. This is a desired property as it makes our measure of entanglement 
universal. However, in order to perform the minimization in eq. (16) we need to be able to define 
what we mean by a disentangled state of, say, N particles. As pointed out in [6], we believe 
that this can be done inductively. Namely, for two quantum systems, A_1 and A_2, we define 
a disentangled state as one which can be written as a convex sum of disentangled states of 
A_1 and A_2 as follows [?]: 

$$\rho_{12} = \sum_i p_i\, \rho_i^{A_1} \otimes \rho_i^{A_2} , \tag{17}$$

where $\sum_i p_i = 1$ and the p's are all positive. Now, for N entangled systems A_1, A_2, ..., A_N, 
the disentangled state is: 

$$\rho_{12\ldots N} = \sum_{\mathrm{perm}\{i_1 i_2 \ldots i_N\}} r_{i_1 i_2 \ldots i_N}\, \rho^{A_{i_1} A_{i_2} \cdots A_{i_n}} \otimes \rho^{A_{i_{n+1}} A_{i_{n+2}} \cdots A_{i_N}} , \tag{18}$$

where $\sum_{\mathrm{perm}\{i_1 i_2 \ldots i_N\}} r_{i_1 i_2 \ldots i_N} = 1$, all r's are positive and where $\sum_{\mathrm{perm}\{i_1 i_2 \ldots i_N\}}$ is a sum 
over all possible permutations of the set of indices {1, 2, ..., N}. To clarify this let us see 
how this looks for 4 systems: 



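$$\begin{aligned}
\rho_{1234} = \sum_i \Big( & p_i\, \rho_i^{A_1 A_2 A_3} \otimes \rho_i^{A_4} + q_i\, \rho_i^{A_1 A_2 A_4} \otimes \rho_i^{A_3} + r_i\, \rho_i^{A_1 A_3 A_4} \otimes \rho_i^{A_2} + s_i\, \rho_i^{A_2 A_3 A_4} \otimes \rho_i^{A_1} \\
& + t_i\, \rho_i^{A_1 A_2} \otimes \rho_i^{A_3 A_4} + u_i\, \rho_i^{A_1 A_3} \otimes \rho_i^{A_2 A_4} + v_i\, \rho_i^{A_1 A_4} \otimes \rho_i^{A_2 A_3} \Big) ,
\end{aligned} \tag{19}$$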



where, as usual, all the probabilities p_i, q_i, ..., v_i are positive and add up to unity. The 
above two equations, at least in principle, define the disentangled states for any number of 
entangled systems. In practice, unfortunately, this might still not be enough to minimize 
the relative entropy to obtain the amount of entanglement. So far a good criterion for 
decomposition into the above form exists for two particles only, when either both are spin 1/2 
or one is spin 1/2 and the other one is spin 1 [?,18] (however, some progress has been made 
by P. Horodecki [?]). The above definition of a disentangled state is justified by extending 
the idea that local actions cannot increase the entanglement between two quantum systems 
[?]. In the case of N particles we have N parties (Alice, Bob, Charlie, ..., Wayne) all 
acting locally on their systems. The general action that also includes communications can 
be written as [?] 

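$$\rho \to \sum_{i_1, i_2, \ldots, i_N} A_{i_1} \otimes B_{i_2} \otimes \cdots \otimes W_{i_N}\; \rho\; A_{i_1}^{\dagger} \otimes B_{i_2}^{\dagger} \otimes \cdots \otimes W_{i_N}^{\dagger} \tag{20}$$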



and it can be easily seen that this action does not alter the form of a disentangled state 
in eqs. (18, 19). In fact, eq. (18) is the most general state invariant in form under the 
transformation given by eq. (20). We suggest this as a definition of a disentangled state for 
N ≥ 3, i.e. it is the most general state invariant in form under local POVM and classical 
communications. This definition of N particle entanglement means that we say that we do 
not have N particle entanglement even if subsets of the N particles are individually entangled. 
We define it this way so that it answers the question, are all N particles entangled, rather 
than the question, is there any entanglement at all between the particles. If we wanted 
to answer the latter question, then clearly the definition of a disentangled N particle state 
would be one that could be written as 

$$\rho = \sum_i p_i\, \rho_i^{A_1} \otimes \rho_i^{A_2} \otimes \cdots \otimes \rho_i^{A_N} . \tag{21}$$
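
The structure of the four-system state in eq. (19) can be made explicit by listing the splits it sums over. The sketch below assumes that each term corresponds to one way of dividing the systems into two non-empty groups, with each unordered split counted once. The function name and the labels are illustrative assumptions.

```python
from itertools import combinations

def bipartitions(labels):
    """All ways to split labels into two non-empty groups, each split listed once."""
    labels = list(labels)
    first, rest = labels[0], labels[1:]
    splits = []
    # Keep the first label in the left group so that no split appears twice;
    # at most len(rest) - 1 companions join it, so the right group is never empty.
    for size in range(len(rest)):
        for companions in combinations(rest, size):
            left = (first,) + companions
            right = tuple(x for x in rest if x not in companions)
            splits.append((left, right))
    return splits

splits_4 = bipartitions(["A1", "A2", "A3", "A4"])
for left, right in splits_4:
    print("".join(left), "(x)", "".join(right))
```

For four systems this lists seven splits (four of type 3+1 and three of type 2+2), matching the seven weights p_i, ..., v_i. The count grows as 2^(N-1) - 1 with the number N of systems.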

We have in this work derived our previously proposed measure of entanglement from 
an entirely different perspective. The amount of entanglement is now seen as the quantity 
that determines "the least number of measurements that is needed to distinguish a given 
state from a disentangled one". This therefore strengthens the argument for using eq. (16) 
as a universal measure of entanglement. In addition, it opens up the possibility both to 
understand the meaning of entanglement from a different, more operational, point of view, 
as well as to measure the amount of entanglement for more than two quantum systems. 